# Jackknife and Bootstrap for Traffic Problems
Dr. M. Baron, Statistical Machine Learning class, STAT-427/627
One needs to estimate p, the probability of a day with 0 traffic accidents on a certain highway. Data are collected during 40 days: 26 days with 0 accidents, 10 days with 1 accident, and 4 days with 2 accidents.
Statistician A estimates p by the sample proportion p̂ = 26/40 = 0.65.
Statistician B argues that this method does not distinguish between the days with 1 accident and the days with 2 accidents, losing some valuable information. She suggests modeling the number of accidents X by a Poisson distribution with parameter λ. Then p = P(X = 0) = e^{-λ}. Statistician B estimates λ by the sample mean X̄, so p̂_Poisson = e^{-X̄} is a new estimator of p, and it uses all the information contained in the data.
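For these data, X̄ = (26·0 + 10·1 + 4·2)/40 = 18/40 = 0.45, so Statistician B's estimate is e^{-0.45} ≈ 0.6376, which matches the R output below.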
However, this estimator is biased: e^{-x} is convex, so by Jensen's inequality E[e^{-X̄}] > e^{-λ}. Both statisticians use the jackknife to reduce the bias, and together they obtain the jackknife estimator p̂_JK.
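For reference, the bias-corrected jackknife estimator (the "jackknife formula" used in the code below) is p̂_JK = n·p̂_Poisson − (n−1)·mean(p̂_Poisson,(−i)), where p̂_Poisson,(−i) is the Poisson-based estimate computed with the i-th day left out, and n = 40. Equivalently, p̂_JK = p̂_Poisson − bias_jack, where bias_jack = (n−1)·(mean(p̂_Poisson,(−i)) − p̂_Poisson) is the jackknife estimate of the bias.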
Now we have three competing estimators: p̂, p̂_Poisson, and p̂_JK. Evaluate and rank them according to their bias and standard error.
Solution:
We can generate the data as follows.
> X = c( rep(0,26), rep(1,10), rep(2,4) )
> X
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
[35] 1 1 2 2 2 2
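As a quick check, the counts of days with 0, 1, and 2 accidents can be tabulated:
> table(X)
X
 0  1  2 
26 10  4 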
Define the function P.poisson = exp(−sample mean) and apply the jackknife.
> P.poisson = function( X ){
+   return( exp(-mean(X)) ) }
> P.poisson(X)
[1] 0.6376282
> library(bootstrap)
> JK = jackknife(X, P.poisson)
> JK$jack.bias
[1] 0.003683256
> P.JK = P.poisson(X) - JK$jack.bias
> P.JK
[1] 0.6339449
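The jackknife bias above can also be computed with an explicit leave-one-out loop; a minimal sketch (the names P.i and bias.jack are only illustrative):
> n = length(X)
> P.i = rep(0, n)
> for (i in 1:n){ P.i[i] = exp(-mean(X[-i])) }       # leave-one-out estimates
> bias.jack = (n-1)*( mean(P.i) - P.poisson(X) )     # same quantity as JK$jack.bias
> P.poisson(X) - bias.jack                           # same quantity as P.JK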
Our three estimates are evaluated as p̂ = 0.65, p̂_Poisson = 0.6376, and p̂_JK = 0.6339.
According to the bias: p̂ is unbiased; p̂_Poisson has an estimated bias of 0.0037; and p̂_JK has a lower bias than p̂_Poisson, because the jackknife reduces bias. Thus, according to the bias, the estimators are ranked as:
(1) p̂
(2) p̂_JK
(3) p̂_Poisson
According to the standard deviation, we compare the estimators with the bootstrap. Define three functions: the sample proportion proposed by Statistician A, the estimator proposed by Statistician B, and its jackknife version.
> P.hat = function( X, sample ){
+   return( mean(X[sample]==0) ) }
> P.poisson = function( X, sample ){
+   return( exp(-mean(X[sample])) ) }
> P.JK = function( X, sample ){
+   n = length(X)
+   P.i = rep(0,n)
+   for (i in 1:n){ P.i[i] = P.poisson(X[sample], -i) }   # leave one out
+   return( n*P.poisson(X, sample) - (n-1)*mean(P.i) ) }  # jackknife formula
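As a quick sanity check before running the bootstrap, applying each function to the full index vector 1:40 should reproduce the point estimates obtained above:
> P.hat(X, 1:40)        # 0.65
> P.poisson(X, 1:40)    # 0.6376282
> P.JK(X, 1:40)         # 0.6339449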
Now, use the bootstrap to evaluate the performance of these estimators from the given data.
> library(boot)
> boot( X, P.hat, R=10000 )
      original         bias    std. error
t1*       0.65   -0.0005225    0.07590366
> boot( X, P.poisson, R=10000 )
      original         bias    std. error
t1*  0.6376282   0.004304599   0.06729368
> boot( X, P.JK, R=10000 )
      original         bias    std. error
t1*  0.6339449   0.004085077   0.06780789
We knew that the first estimator, p̂, is unbiased. However, it ignores some information, as is correctly noted in the problem. As a result, it has the highest standard error among the three competing estimators. The second estimator, p̂_Poisson, has some bias, but it has a lower standard error. Its jackknife version p̂_JK achieves some bias reduction, but at the expense of a slightly higher standard error.
These results may vary somewhat because they are
based on random bootstrap samples.
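To make such a run reproducible, fix the random seed before calling boot(); a minimal example (the seed value 627 is arbitrary):
> set.seed(627)
> boot( X, P.poisson, R=10000 )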
Summary
According to the lower bias, the estimators are ranked as
(1) p̂ (best)
(2) p̂_JK
(3) p̂_Poisson (worst)
According to the lower standard deviation (standard error), the estimators are ranked as
(1) p̂_Poisson (best)
(2) p̂_JK
(3) p̂ (worst)